Identifying the Interaction between Genes and Gene Products Based on Frequently Seen Verbs in Medline Abstracts.

Authors

  • Sekimizu
  • Park
  • Tsujii
Abstract

We have selected the most frequently seen verbs from raw texts made up of 1 million words of Medline abstracts, and we were able to identify (or bracket) the noun phrases contained in the corpus with a precision rate of 90%. Then, based on the noun-phrase-bracketed corpus, we tried to find the subject and object terms for some frequently seen verbs in the domain. The precision rate of finding the right subject and object for each verb was about 73%. This task was only made possible because we were able to linguistically analyze (or parse) a large quantity of raw corpus text. Our approach will be useful for classifying genes and gene products and for identifying the interactions between them. It is the first step of our effort to build a genome-related thesaurus and hierarchies in a fully automatic way.
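The two steps the abstract describes (picking out the most frequent verbs, then attaching a subject and an object term to each) can be illustrated with a minimal Python sketch. It is not the authors' method. It assumes a hypothetical input format in which every sentence has already been POS-tagged and noun-phrase-bracketed, with each chunk stored as a `(tag, text)` pair and `"NP"` marking a bracketed noun phrase; it uses Penn Treebank verb tags and a deliberately naive nearest-noun-phrase heuristic. The function names (`frequent_verbs`, `subject_object`, `extract_pairs`) and the toy sentences in `EXAMPLE_CORPUS` are invented for illustration.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical input format (an assumption, not the paper's): a sentence is a
# list of chunks; a chunk is ("NP", "bracketed noun phrase") or (POS_tag, word).
Chunk = Tuple[str, str]
Sentence = List[Chunk]

# Penn Treebank verb tags.
VERB_TAGS = {"VB", "VBD", "VBG", "VBN", "VBP", "VBZ"}

# Toy, invented sentences used only to exercise the functions below.
EXAMPLE_CORPUS: List[Sentence] = [
    [("NP", "protein A"), ("VBZ", "activates"), ("NP", "gene B"), (".", ".")],
    [("NP", "protein C"), ("VBZ", "activates"), ("NP", "protein D"), (".", ".")],
    [("NP", "protein E"), ("VBZ", "binds"), ("NP", "gene F"), (".", ".")],
]


def frequent_verbs(corpus: List[Sentence], top_n: int) -> List[str]:
    """Return the top_n most frequent verb forms (lowercased) in the corpus."""
    counts = Counter(
        word.lower() for sent in corpus for tag, word in sent if tag in VERB_TAGS
    )
    return [verb for verb, _ in counts.most_common(top_n)]


def subject_object(sent: Sentence, verb: str) -> Optional[Tuple[str, str]]:
    """Naive heuristic for the first occurrence of `verb` in the sentence:
    the nearest NP to its left is the subject, the nearest NP to its right
    is the object. Returns None if the verb or either NP is missing."""
    for i, (tag, word) in enumerate(sent):
        if tag in VERB_TAGS and word.lower() == verb:
            left = [text for t, text in sent[:i] if t == "NP"]
            right = [text for t, text in sent[i + 1:] if t == "NP"]
            if left and right:
                return left[-1], right[0]
            return None
    return None


def extract_pairs(
    corpus: List[Sentence], verbs: List[str]
) -> Dict[str, List[Tuple[str, str]]]:
    """Collect (subject, object) pairs for each of the given verbs."""
    pairs: Dict[str, List[Tuple[str, str]]] = {v: [] for v in verbs}
    for sent in corpus:
        for v in verbs:
            pair = subject_object(sent, v)
            if pair is not None:
                pairs[v].append(pair)
    return pairs


if __name__ == "__main__":
    top = frequent_verbs(EXAMPLE_CORPUS, 1)
    print(top)
    print(extract_pairs(EXAMPLE_CORPUS, top))
```

A heuristic this simple ignores passives (where the grammatical subject is the logical object), relative clauses and coordination, so it would mislabel many real Medline sentences.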

Similar resources

Using the Protein-protein Interaction Network to Identifying the Biomarkers in Evolution of the Oocyte

Background Oocyte maturity includes nuclear and cytoplasmic maturity, both of which are important for embryo fertilization. The development of oocyte is not limited to the period of follicular growth, and starts from the embryonic period and continues throughout life. In this study, for the purpose of evaluating the effect of the FSH hormone on the expression of genes, GEO access codes for this...

Network-based transcriptome analysis in salt tolerant and salt sensitive maize (Zea mays L.) genotypes

Identification of genes involved in salinity stress tolerance provides deeper insight into molecular mechanisms underlying salinity tolerance in maize. The present study was conducted in the faculty of agriculture of Urmia university, Iran, in 2018, with the aim of identifying genetic differences between two maize genotypes in tolerance to salinity stress, and the results of gene expression wer...

Functional analysis of Subject and Verb in Theses Abstracts on Applied Linguistics

The purpose of the present study is to analyse abstracts related to Applied Linguistics, and more precisely the discourse functions of grammatical subjects and verbs. The corpus consisted of 50 PhD thesis abstracts written on the subject of Applied Linguistics. All of the abstracts were written from 2010 to 2014. The theses from which the abstracts were extracted are available in the ProQuest d...

Detecting Multiword Verbs in the English Sublanguage of MEDLINE Abstracts

In this paper, we investigate the multiword verbs in the English sublanguage of MEDLINE abstracts. Based on the integration of the domain-specific named entity knowledge and syntactic as well as statistical information, this work mainly focuses on how to evaluate a proper multiword verb candidate. Our results present a sound balance between the low- and high-frequency multiword verb candidates in...

Comparative experiments on learning information extractors for proteins and their interactions

OBJECTIVE Automatically extracting information from biomedical text holds the promise of easily consolidating large amounts of biological knowledge in computer-accessible form. This strategy is particularly attractive for extracting data relevant to genes of the human genome from the 11 million abstracts in Medline. However, extraction efforts have been frustrated by the lack of conventions for...

Journal title:
  • Genome informatics. Workshop on Genome Informatics

Volume 9

Publication date: 1998